Skip to content

Range read block on rw conflict#229

Merged
githubzilla merged 22 commits intoeloqdata:eloq-10.6.10from
githubzilla:range_read_block_on_rw_conflict
Feb 11, 2026
Merged

Range read block on rw conflict#229
githubzilla merged 22 commits intoeloqdata:eloq-10.6.10from
githubzilla:range_read_block_on_rw_conflict

Conversation

@githubzilla
Copy link
Collaborator

@githubzilla githubzilla commented Feb 8, 2026

Summary by CodeRabbit

  • Tests

    • Added a test for range-split deadlock-abort handling, including write-intent downgrade and retry verification.
    • Added a test validating that reads block while a write lock is held during range-splits and complete after release.
    • Added test options to stabilize checkpoint timing.
  • Chores

    • Updated tracked submodule pointer (no functional API changes).

@coderabbitai
Copy link

coderabbitai bot commented Feb 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds two new ELoQ MySQL tests (range_read_block_on_write_lock and range_split_deadlock_abort), their .opt and .result artifacts, and bumps the data_substrate submodule pointer; tests simulate range-split locking, fault injection (DEAD_LOCK_ABORT), and verify blocking/unblocking behavior and split completion.

Changes

Cohort / File(s) Summary
Range Read Block on Write Lock
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test, storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.opt, storage/eloq/mysql-test/mono_basic/r/range_read_block_on_write_lock.result
New test, option, and expected-result files. Seeds data to force a range split, injects a debug pause during prepare while holding a write-intent lock, spawns concurrent readers to assert they block, then releases the lock and verifies reads complete. Option adds --checkpointer_interval=86400.
Range Split Deadlock Abort
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test, storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt, storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result
New test, option, and result files. Injects DEAD_LOCK_ABORT at range-split phase 8, asserts DowngradeLock and retry behavior, waits for checkpoint-driven split progress, verifies data accessibility and no blocked queries. Option adds --checkpointer_interval=86400.
Submodule bump
data_substrate
Updated submodule commit pointer; no code changes in this diff.

Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant Conn as Connection
participant SE as StorageEngine
participant LM as LockManager
participant CP as Checkpointer
participant FI as FaultInjector

Client->>Conn: issue WRITE that triggers range-split prepare
Conn->>SE: start range-split phase 8 (acquire write-intent lock)
FI->>SE: inject DEAD_LOCK_ABORT at phase 8
Client->>Conn: spawn concurrent READs
Conn->>SE: READs attempt access — blocked by write-intent lock
CP->>SE: checkpoint advances (drives split path)
SE->>LM: handle DEAD_LOCK_ABORT → DowngradeLock (write→read)
LM->>SE: allow retry of phase 8
SE->>Conn: complete split and unblock reads
Conn->>Client: READs return results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • liunyl
  • thweetkomputer

Poem

🐇 I hopped through rows and nudged phase eight,
Locks held tight while readers learned to wait,
A fault then blinked and nudged a softer hand,
The split retried and queries cleared the land,
I twitched my whiskers — tests passed, all went great!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Range read block on rw conflict' directly summarizes the main objective of the pull request. The changes add new test scripts to verify range read blocking behavior when there are read-write conflicts during range split operations.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`:
- Line 80: The test currently issues unbounded queries "select c0, c1, c2 from
t1 order by c0" (the --send at the first occurrence and the subsequent
reap/re-execute of "select c0, c1, c2 from t1"), which dumps ~2048 rows twice
into the .result file; replace those full-result queries with "select count(*)
from t1" to verify completion without dumping rows, or alternatively surround
the original sends/reaps with "--disable_result_log" / "--enable_result_log" so
the large outputs are not recorded; update both the initial --send statement and
the reap/re-verify re-execution occurrences to use one of these two approaches.
- Around line 130-135: The wait condition stored in $wait_condition is too
permissive because the "(info IS NULL and time >= 1)" branch can match unrelated
idle/daemon connections; tighten it so we only consider connections that
actually reference t1 or are in a lock state. Edit the SQL in $wait_condition to
remove the generic "info IS NULL and time >= 1" clause and instead use just
"(state like "%lock%") OR (info like "select%from t1%")" so only lock-holding or
t1-related queries are considered by the wait condition.

In `@storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test`:
- Around line 65-70: The wait condition in $wait_condition is too broad because
`(info IS NULL and time >= 1)` matches idle connections (including the test's
own default connection); update the predicate in the select from
information_schema.processlist to exclude the current connection and only target
other idle sessions — e.g. replace `(info IS NULL and time >= 1)` with `(info IS
NULL and time >= 1 and id != CONNECTION_ID())` (keep the existing `info like
"select%from t1%"` check intact) so the count(*) = 0 check no longer blocks on
unrelated idle connections; locate and modify the $wait_condition definition to
apply this change.
🧹 Nitpick comments (5)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (3)

42-42: Hardcoded --sleep durations make the test slow and potentially flaky.

The test accumulates --sleep 5 + --sleep 2 + --sleep 2 + --sleep 10 + --sleep 5 = 24 seconds of fixed waits. On fast machines these are wasted time; on slow CI machines they may be insufficient. Where possible, prefer wait_condition loops (which already exist elsewhere in this test) over fixed sleeps. At minimum, the --sleep 10 on line 83 seems excessive for verifying that reads are blocked — a shorter sleep followed by the wait_condition on line 93 should suffice.

Also applies to: 50-50, 54-58, 83-83


67-70: Wait condition in Test Case 1 is overly broad.

The condition state like "%lock%" or info like "select count(*) from t1%" matches any connection with "lock" in its state (which could match unrelated connections) OR any connection running the matching query (which doesn't necessarily mean it's blocked). Consider combining the conditions with AND and also filtering by connection ID to be more precise.

Proposed tighter condition
 let $wait_condition=
     select count(*) >= 1 from information_schema.processlist
-    where (state like "%lock%" or info like "select count(*) from t1%")
+    where info like "select count(*) from t1%"
+    and state like "%lock%"
     and id != CONNECTION_ID();

93-99: Same broad condition pattern in Test Case 2.

Similar to Test Case 1, using OR between state like "%lock%" and the info matches weakens the assertion. This should require both a matching query and a lock-wait state to reliably confirm the reads are blocked. Also, the count threshold (>= 3) includes conn_read1 from Test Case 1 — confirm that conn_read1's query is still pending at this point (it should be, since the debug point is still active).

storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (2)

51-54: 15 seconds of fixed sleeps with no condition-based gating.

Same concern as the sibling test — --sleep 5 + --sleep 10 is slow on fast machines and fragile on slow CI. A wait_condition polling for evidence that the split reached phase 8 (e.g., checking a status variable or processlist state) would be more robust.


36-36: For consistency with the sibling test, use explicit column list instead of SELECT *.

The insert into t1 select * from t1; pattern works correctly in the Eloq engine, but the sibling test (range_read_block_on_write_lock.test) uses an explicit column list for better clarity:

insert into t1 (c1, c2, c3, c4) select c1, c2, c3, c4 from t1;

Consider adopting the same pattern here for consistency across the test suite.

Comment on lines +130 to +135
let $wait_condition=
select count(*) = 0 from information_schema.processlist
where ((state like "%lock%") or
(info IS NULL and time >= 1))
and (info like "select%from t1%" or info IS NULL);
--source include/wait_condition.inc
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wait condition in Test Case 5 may match unrelated idle connections.

The predicate (info IS NULL and time >= 1) combined with (info like "select%from t1%" or info IS NULL) will match any idle connection with NULL info and time >= 1, including daemon threads or the default connection itself (excluded only by the lock-state branch, not the NULL-info branch). This can cause the wait_condition to never reach count(*) = 0, leading to a timeout and false test failure.

Proposed fix: tighten the condition to only match t1-related queries
 let $wait_condition=
     select count(*) = 0 from information_schema.processlist
-    where ((state like "%lock%") or
-           (info IS NULL and time >= 1))
-    and (info like "select%from t1%" or info IS NULL);
+    where info like "select%from t1%"
+    and id != CONNECTION_ID();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let $wait_condition=
select count(*) = 0 from information_schema.processlist
where ((state like "%lock%") or
(info IS NULL and time >= 1))
and (info like "select%from t1%" or info IS NULL);
--source include/wait_condition.inc
let $wait_condition=
select count(*) = 0 from information_schema.processlist
where info like "select%from t1%"
and id != CONNECTION_ID();
--source include/wait_condition.inc
🤖 Prompt for AI Agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`
around lines 130 - 135, The wait condition stored in $wait_condition is too
permissive because the "(info IS NULL and time >= 1)" branch can match unrelated
idle/daemon connections; tighten it so we only consider connections that
actually reference t1 or are in a lock state. Edit the SQL in $wait_condition to
remove the generic "info IS NULL and time >= 1" clause and instead use just
"(state like "%lock%") OR (info like "select%from t1%")" so only lock-holding or
t1-related queries are considered by the wait condition.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`:
- Around line 1-6: This test uses debug sync points via debug_dbug, so add
sourcing of have_debug.inc at the top of the test to skip on non-debug builds:
include/have_debug.inc should be sourced before any debug_dbug usage (i.e.,
before the first debug_dbug call in this file), and consider adding the same
include to the shared eloq include files so all tests that call debug_dbug will
skip when not built with debug support.
🧹 Nitpick comments (3)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (3)

42-42: Total of ~24 seconds of hard sleeps makes this test slow and potentially flaky.

The test accumulates --sleep 5 + 2 + 2 + 10 + 5 = 24 seconds of hard sleeps. The 10-second sleep at line 83 is particularly aggressive. While some sleep is unavoidable for debug-sync based tests, consider whether the wait_condition checks could replace some of these sleeps, or whether shorter sleeps suffice (especially lines 50 and 83).

Also applies to: 50-50, 58-58, 83-83, 106-106


105-105: Remove commented-out debug command.

This commented-out SET SESSION debug_dbug line appears to be a leftover from development/debugging. If it's intentionally kept as documentation, add a comment explaining why; otherwise remove it to keep the test clean.

 SET SESSION debug_dbug="-d,eloq;term_SplitFlushOp_CommitAcquireAllWriteOp_Continue;node_id=-1";
-# SET SESSION debug_dbug="+d,eloq;at_once;node_id=-1;action=NOTIFY_CHECKPOINTER";
 --sleep 5

120-125: Unnecessary full table re-scan at line 123.

After --reap collects the result from the --send on line 80, line 123 re-executes select c0, c1, c2 from t1 (without ORDER BY), scanning all ~2048 rows again for no verification purpose since result logging is disabled. Consider removing it or replacing with a lightweight check if needed.

 --connection conn_read3
 --disable_result_log
 --reap
-select c0, c1, c2 from t1;
 --enable_result_log
 commit;

Comment on lines +1 to +6
--source include/have_eloq.inc
set @@default_storage_engine= eloq;

--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find tests in the eloq directory that use debug_dbug
echo "=== Tests using debug_dbug ==="
rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | sort

echo ""
echo "=== Tests sourcing have_debug.inc ==="
rg -l 'have_debug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | sort

echo ""
echo "=== Tests using both debug_dbug AND have_debug ==="
rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | while read f; do
  if rg -q 'have_debug' "$f" 2>/dev/null; then
    echo "$f"
  fi
done

echo ""
echo "=== Tests using debug_dbug but NOT have_debug ==="
rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | while read f; do
  if ! rg -q 'have_debug' "$f" 2>/dev/null; then
    echo "$f"
  fi
done

Repository: eloqdata/eloqsql

Length of output: 2890


🏁 Script executed:

#!/bin/bash
# Check if have_debug.inc exists anywhere in the repo
echo "=== Searching for have_debug.inc ==="
find . -name "have_debug.inc" 2>/dev/null

echo ""
echo "=== Checking eloq includes directory ==="
ls -la storage/eloq/mysql-test/mono_basic/include/ 2>/dev/null || echo "Include dir not found"

echo ""
echo "=== All .inc files in eloq includes ==="
ls storage/eloq/mysql-test/mono_basic/include/*.inc 2>/dev/null | head -20

echo ""
echo "=== Checking if have_debug.inc exists in main mysql test suite ==="
find . -path "*/mysql-test/include/have_debug.inc" 2>/dev/null

Repository: eloqdata/eloqsql

Length of output: 837


🏁 Script executed:

#!/bin/bash
echo "=== Content of have_debug.inc ==="
cat mysql-test/include/have_debug.inc

echo ""
echo "=== Content of have_eloq.inc ==="
cat storage/eloq/mysql-test/mono_basic/include/have_eloq.inc

echo ""
echo "=== First few lines of an eloq test file for comparison ==="
head -10 storage/eloq/mysql-test/mono_basic/t/alter_table_recovery.test

Repository: eloqdata/eloqsql

Length of output: 657


🏁 Script executed:

#!/bin/bash
echo "=== Check if eloq tests can source ../../../../../mysql-test/include/have_debug.inc ==="
head -5 storage/eloq/mysql-test/mono_basic/t/alter_table_recovery.test | grep -E 'source|include'

echo ""
echo "=== Check how many eloq tests source long_test.inc ==="
rg -l 'long_test.inc' storage/eloq/mysql-test/mono_basic/t/ | wc -l

echo ""
echo "=== All eloq tests and their includes ==="
for f in storage/eloq/mysql-test/mono_basic/t/*.test; do
  echo "=== $(basename $f) ==="
  head -10 "$f" | grep -E '^\-\-source'
done | head -50

Repository: eloqdata/eloqsql

Length of output: 1918


Source include/have_debug.inc to skip on non-debug builds.

This test uses debug_dbug (lines 20, 40, 104) which requires a debug build. Standard MySQL tests that rely on debug sync points source have_debug.inc to skip gracefully on non-debug builds. Without this, the test may fail unexpectedly if the debug sync points don't fire.

 --source include/have_eloq.inc
+--source include/have_debug.inc
 set @@default_storage_engine= eloq;

Note: This pattern applies to all 20 debug_dbug-using tests in the eloq suite, not just this one. Consider adding have_debug.inc to the shared eloq include files for consistency.

🤖 Prompt for AI Agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`
around lines 1 - 6, This test uses debug sync points via debug_dbug, so add
sourcing of have_debug.inc at the top of the test to skip on non-debug builds:
include/have_debug.inc should be sourced before any debug_dbug usage (i.e.,
before the first debug_dbug call in this file), and consider adding the same
include to the shared eloq include files so all tests that call debug_dbug will
skip when not built with debug support.

@githubzilla githubzilla force-pushed the range_read_block_on_rw_conflict branch from 4aef7f8 to 481ab8a Compare February 11, 2026 05:30
@githubzilla githubzilla merged commit a06a46c into eloqdata:eloq-10.6.10 Feb 11, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants